
BLAS vector-vector dot #294

Merged
abergeron merged 16 commits into Theano:master from khaotik:blas_dot on Nov 29, 2016

Conversation

@khaotik (Contributor) commented Nov 24, 2016

For #292.

Comment thread src/gpuarray/blas.h Outdated
#endif

// only for vector-vector dot
GPUARRAY_PUBLIC int GpuArray_dot( GpuArray *A, GpuArray *B,
Member:

You declare _dot and then use _rdot. That might not work.

ctx->err = cuModuleLoadData(&res->m, bin);
// for both info/err log
cujit_info_log = (char*)malloc(2*cujit_log_size*sizeof(char));
if(cujit_info_log == NULL) {
Member:

Space after if.


ctx->err = cuModuleLoadData(&res->m, bin);
// for both info/err log
cujit_info_log = (char*)malloc(2*cujit_log_size*sizeof(char));
Member:

No need to cast the return of malloc.

Contributor Author:

I think it's more friendly to possible inclusion from C++ code, but I'll remove it anyway.

Member:

The headers should be C++-clean, but the source files will probably never be touched by a C++ compiler.
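
For illustration, a minimal standalone sketch of the point being made: in C, the `void *` returned by `malloc` converts implicitly to any object pointer type, so the cast is redundant, while C++ would require it (the log-size value here is illustrative).

```c
#include <stdlib.h>

int main(void) {
  size_t cujit_log_size = 1024; /* illustrative size */
  /* In C, malloc's void * return converts implicitly to char *,
   * so the (char *) cast is unnecessary (and sizeof(char) is 1). */
  char *cujit_info_log = malloc(2 * cujit_log_size * sizeof(char));
  if (cujit_info_log == NULL)
    return 1;
  free(cujit_info_log);
  return 0;
}
```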

@abergeron

This PR seems to be more about using cuModuleLoadDataEx than blas dot.

The blas parts should either be completed or removed.

@abergeron

OK, sorry. I just realized that you reused the same branch for this PR, so the commits from the other one are in here. You should probably rebase on the current master.

Comment thread src/gpuarray_array_blas.c Outdated
if (!(X->flags & GA_ALIGNED) || !(Y->flags & GA_ALIGNED) ||
!(Z->flags & GA_ALIGNED))
return GA_UNALIGNED_ERROR;
if (X->dimensions[0] != n || Y->dimensions[0] != n)
Member:

You don't set n before this. Also you could just compare them to each other.
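A minimal sketch of the suggested fix (hypothetical; assumes `GA_VALUE_ERROR` is the appropriate error code for a length mismatch):

```c
/* Compare the two vector lengths to each other instead of against
 * an unset n, then derive n from them. */
if (X->dimensions[0] != Y->dimensions[0])
  return GA_VALUE_ERROR;
n = X->dimensions[0];
```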

Comment thread src/gpuarray_array_blas.c Outdated
Yp = &copyY;
}
}
if (Z->strides[0] < 0) {
Member:

Z doesn't have strides, it's a scalar.

Comment thread src/gpuarray_blas_cuda_cublas.c Outdated

// we should store dot result on device
cublasGetPointerMode(h->h, &pmode);
cublasSetPointerMode(h->h, CUBLAS_POINTER_MODE_HOST);
Member:

I think you meant to set the mode to DEVICE here.
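A sketch of the suggested fix using the cuBLAS pointer-mode API: DEVICE mode makes the dot routine write its result through a device pointer, which is what storing the result on the device requires (the actual dot call is elided):

```c
cublasPointerMode_t pmode;
cublasGetPointerMode(h->h, &pmode);
/* DEVICE mode: the result pointer passed to cublasSdot is a device
 * pointer, so the result stays on the GPU. */
cublasSetPointerMode(h->h, CUBLAS_POINTER_MODE_DEVICE);
/* ... cublasSdot(...) ... */
cublasSetPointerMode(h->h, pmode); /* restore the previous mode */
```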

Comment thread src/gpuarray_blas_cuda_cublas.c Outdated
size_t N,
gpudata *X, size_t offX, size_t incX,
gpudata *Y, size_t offY, size_t incY,
gpudata *Z) {
Member:

There should be an offset for Z.
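A hypothetical corrected prototype, adding an `offZ` parameter to mirror the offsets taken for X and Y:

```c
static int sdot(size_t N,
                gpudata *X, size_t offX, size_t incX,
                gpudata *Y, size_t offY, size_t incY,
                gpudata *Z, size_t offZ);
```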

Comment thread src/gpuarray_blas_cuda_cublas.c Outdated

GA_CUDA_EXIT_ON_ERROR(ctx, cuda_wait(X, CUDA_WAIT_READ));
GA_CUDA_EXIT_ON_ERROR(ctx, cuda_wait(Y, CUDA_WAIT_READ));
GA_CUDA_EXIT_ON_ERROR(ctx, cuda_wait(Z, CUDA_WAIT_ALL));
Member:

We should only wait for write on Z since it is not read by dot.
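The suggested change, assuming libgpuarray's `CUDA_WAIT_WRITE` flag:

```c
/* Z is only written by dot, never read, so a write wait suffices. */
GA_CUDA_EXIT_ON_ERROR(ctx, cuda_wait(Z, CUDA_WAIT_WRITE));
```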

@abergeron

Also, please get rid of the merge commit and do a proper rebase.

@abergeron

Just to note that you still have a merge commit. Just a different one now.

After a rebase, when you want to push to GitHub, git will tell you to run `git pull`. This is wrong; you should run `git push -f` instead.

Otherwise, the code progresses nicely. Good job!

@abergeron abergeron dismissed their stale review November 25, 2016 16:57

Lots of changes

Plus some minor changes:

- did `chmod +x setup.py`
- added interface for clblas
- added tests for BLAS dot
- implementation for CLBlast
- modified blas tests from using nested for loops to itertools.product for parametrized tests
@khaotik khaotik changed the title from "[WIP] BLAS vector-vector dot" to "BLAS vector-vector dot" on Nov 27, 2016
@khaotik (Contributor Author) commented Nov 27, 2016

@abergeron

Mostly done, just a few problems.

Changes

  • BLAS dot for libgpuarray, with CUDA/clBLAS/CLBlast backends.
    clblasSdot needs a working buffer of size N, probably for the sum reduction. I just made a naive implementation (allocate, then release; see the sketch after this list). I was about to make a static buffer shared between calls, but that is not thread safe, so for now there is some overhead with the naive implementation.
  • pygpu bindings and tests.
  • Changed all strides (inc* arguments) to type int to allow negative strides.
    I did this because it is specified in the BLAS standard and is supported by some CPU BLAS libraries. I am not sure whether these GPU libraries will change in the future.
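
A minimal sketch of the naive allocate-then-release approach described above, assuming the standard clblasSdot signature; the `buf` and `q` member names are illustrative and error handling is trimmed:

```c
cl_int cl_err;
/* clblasSdot requires a scratch buffer of at least N elements,
 * presumably to hold the products before the sum reduction. */
cl_mem scratch = clCreateBuffer(ctx->ctx, CL_MEM_READ_WRITE,
                                N * sizeof(float), NULL, &cl_err);
if (cl_err != CL_SUCCESS)
  return GA_MEMORY_ERROR;
cl_err = clblasSdot(N, Z->buf, offZ, X->buf, offX, incX,
                    Y->buf, offY, incY, scratch,
                    1, &ctx->q, 0, NULL, NULL);
/* Naive: release right away. A buffer cached across calls would
 * avoid the per-call allocation overhead, but a plain static
 * buffer is not thread safe. */
clReleaseMemObject(scratch);
```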

Comment thread src/gpuarray_blas_opencl_clblas.c Outdated
scratch_mem = clCreateBuffer(
ctx->ctx, CL_MEM_READ_WRITE, N*sizeof(float), NULL, &cl_err);
if (cl_err != CL_SUCCESS)
return GA_MEMORY_ERROR;
Member:

The temporary buffer allocation is ok, however please use cl_alloc() to allocate it so we can have one central location for all allocations.

Comment thread src/gpuarray_blas_opencl_clblast.c Outdated
error,
hdot, /* TODO */
sdot, /* TODO */
ddot, /* TODO */
Member:

These are not TODO.

Comment thread src/private.h Outdated

int (*hdot)( size_t N,
gpudata *X, size_t offX, int incX,
gpudata *Y, size_t offY, int incY,
Member:

Don't use int, use ssize_t for a signed type. We don't want to limit the size of arrays at this level.
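A sketch of the signed-stride variant the reviewer asks for, mirroring the `hdot` slot above (hypothetical):

```c
int (*hdot)(size_t N,
            gpudata *X, size_t offX, ssize_t incX,
            gpudata *Y, size_t offY, ssize_t incY,
            gpudata *Z, size_t offZ);
```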

Contributor Author (@khaotik) commented Nov 29, 2016:

It seems some BLAS functions use int as the stride type (such as *gemv and *ger) while others use size_t. For now I just reverted to size_t. I am experimenting with ssize_t but am getting test errors, so that still needs some work. I think this is ready for the jenkins test if there are no other issues. I will leave the stride problem to another PR.

Comment thread src/gpuarray_blas_cuda_cublas.c Outdated
" const float *x[], size_t incx, " \
" float *y[], size_t incy, " \
" const float *x[], int incx, " \
" float *y[], int incy, " \
Member:

Here you changed the data type in the kernel without changing the declared data type. This will break badly.

@abergeron

jenkins test this please
